1 Introduction

1.1 Background and motivation

Human resources are among the most valuable assets of any country¹ and weigh heavily on the success or failure of any organization. An educated and competent workforce is a key driver of economic and social development, which makes academic education critically important. It is therefore worth investing time and money in studying students’ academic performance and identifying effective ways to improve it.

Given its importance, the topic has received considerable attention in past research, and many studies have analyzed the factors that affect students’ academic performance. Some focus on psychological variables, such as the work of Franck Amadieu & André Tricot², while others examine the impact of mobility³, gender and other socio-economic factors on students’ academic success.

Several reasons motivated our choice of topic. As students, we are passionate about education, and we want this project to provide a detailed analysis that leaders in the educational field can use as a reference. In particular, we want to help schools and universities better understand the factors that influence students’ academic performance, so that they can improve their decision-making processes, their students’ success rates and, eventually, their overall organization.

Sources:
¹ Jean-Marie Peretti, Gestion des ressources humaines, 2004.
² Franck Amadieu & André Tricot, Psychological factors which have an effect on student success, 2015.
³ La migration pour études : Regards d’intervenants sur l’accueil et l’intégration des nouveaux étudiants, 2009.

1.2 Project objectives

The aim of the project is to understand how academic performance in French secondary schools has evolved. Our study focuses on students in troisième, the final year of collège (equivalent to 11th grade in Switzerland), and on their results in the Diplôme National du Brevet (DNB), by school.

First, we will examine whether DNB pass rates have improved or deteriorated over the years. Using the same data set, we will also compare results geographically and analyze, for each school, the share of students admitted with each distinction.

Then, we will investigate whether academic success is correlated with socio-economic factors such as the type of accommodation, the rate of single-parent families, and schools’ involvement in students’ physical and sports activities. Finally, beyond these factors, we will investigate whether the COVID-19 pandemic has had a direct negative impact on students’ school performance.

1.3 Research questions

  • What is the evolution of student performance over time and across the different regions/departments of France?

  • Do socio-economic factors such as the type of accommodation, family situation or school policies influence student success?

  • Has the COVID-19 pandemic impacted student performance?

2 Data Sets

2.1 dnb_results

This data set presents the results of the “diplôme national du brevet” by school, for schools in metropolitan France and in the overseas departments and regions. It contains 139’580 observations.

Variable Meaning
session Year of the exam session
school_id School identification number
school_type School type divided in six categories: COLLEGE, LYCEE PROFESSIONNEL, LYCEE, EREA, CFA, and AUTRE
establishment_name Name of the establishment
education_sector Education sector categorised as public or private
municipality_code Municipality code
municipality Name of the municipality
department_code Department code. Note that France has 101 departments.
department Name of the department
academy_code Academy code
academy_name Name of the academy
region_code Region code. Note that France has 18 administrative regions.
region Name of the region
registered Registered candidates
present Candidates present for the exam
admitted Candidates admitted
admitted_without Candidates admitted without distinction
admitted_AB Candidates admitted with distinction “Assez Bien”
admitted_B Candidates admitted with distinction “Bien”
admitted_TB Candidates admitted with distinction “Très bien”
success_rate Success rate: admitted candidates as a percentage of candidates present

Source of the data set

2.2 Generation 2024 data set

This data set gathers all schools which have been awarded the “Generation 2024” label. The objective of this label, developed in view of the Paris 2024 Olympic Games, is to develop bridges between the school world and the sports movement in order to encourage young people to take part in physical activity and sport. This data set contains 6’883 observations.

Variable Meaning
region Name of the region
academy Name of the academy
department Name of the department
municipality Name of the municipality
establishment Name of the establishment
school_id School identification number
school_type School type
education_sector Education sector categorised as public or private
postcode Postcode
adress Address of the establishment
adress_2 Additional address of the establishment
mail E-mail address of the establishment
students Number of students in the school
priority_education Indicates whether the school is located in a priority education network (REP) or a reinforced priority education network (REP+)
city_school Indicates whether the school is part of a city school
QPV Position relative to a priority neighbourhood of the city policy (quartier prioritaire de la politique de la ville), a policy aimed at compensating for differences in living standards with the rest of the territory.
ULIS Indicates whether the school offers a ULIS (Localized Unit for School Inclusion)
SEGPA Indicates whether the school has a SEGPA (adapted general and vocational education sections)
sport_section Indicates whether the school has a sports section
agricultural_high_school Indicates whether the school is an agricultural high school
military_high_school Indicates whether the school is a military high school
vocational_high_school Indicates whether the establishment is labeled “vocational high school”
establishment_web Url of the description of the establishment page on the ONISEP website
SIREN_SIRET SIREN/SIRET number of the establishment. SIREN is the French business register identification system.
district Name of the district to which the school is attached
ministry Ministry responsible for the institution
label_start_date Start date of the “generation 2024” label. Format yyyy/mm/dd
label_end_date End date of the “generation 2024” label. Format yyyy/mm/dd
y_coordinate Y coordinate of the establishment, using the EPSG coordinate system
x_coordinate X coordinate of the establishment, using the EPSG coordinate system
epsg EPSG code of the coordinate system used to locate the establishment
precision_on_localisation Specification of the geographical location of the establishment
latitude Latitude
longitude Longitude
position Geographical position
engaging_30_sport Indicates whether the institution participates in the 30 minutes of daily physical activity programme

Source of the data set

2.3 Student housing Data set

This dataset records enrolment in secondary schools according to the type of accommodation for pupils: half-board, boarding school etc. This data set contains 32’096 observations.

Variable Meaning
year_back_to_school Year of the start of the school year
Academic_region Name of the academic region
academy Name of the academy
department Name of the department
municipality Name of the municipality
school_id School identification number
establishment_main_name Main name of the establishment
establishment_name Name of the establishment
school_type School type
education_sector Education sector categorised as public or private
students_secondary_education Students in secondary education
students_higher_education Number of students in higher education
external_students_secondary_education External students in secondary education
half_boarders_students_secondary_education Half-boarders in secondary education
boarding_students_secondary_education Boarding students in secondary education
external_students_higher_education External students in higher education
half_board_students_higher_education Half-board students in higher education
boarding_students_higher_education Boarding students in higher education

Source of the data set

2.4 Single-parent families dataset

This data set provides information about the single-parent families in each municipality. The census has been conducted every five years since 2008. This data set contains 104’986 observations.

Variable Meaning
geocode Geographical code from INSEE
municipality Name of the municipality
year Census year
sing_par Single-parent families

Source of the data set

2.5 Covid Data set

This is a time-based data set with information on the COVID tests carried out by laboratories, hospitals, pharmacists, doctors and nurses, and on their results. It is updated daily. On 11 October, the data set contained 543’974 observations.

Variable Meaning
department_code Department code
test_week Week of the tests, given as a date range. Format yyyy-mm-dd-yyyy-mm-dd
educational_level Description of the age group as [m-n], m and n being the lower and upper limits.
age_group Denomination of the age group. n-1 is used in this case, except for the oldest group, where 18 is used
pop Population
positive Daily patients testing positive
tested Daily patients tested
incidence_rate Incidence rate
positivity_rate Positivity rate
screening_rate Screening rate

Source of the data set

Loading of the data

# Packages used for loading and wrangling (the plots further below also need viridis, hrbrthemes and plotly).
library(tidyverse)  # readr, dplyr, tidyr, stringr, tibble, ggplot2
library(lubridate)  # month(), year(), ymd()
library(stringi)    # stri_trans_general()
DNB_par_etablissement <- read_delim(here::here("data/DNB-par-etablissement.csv"), ";", escape_double = FALSE, trim_ws = TRUE)
Etablissements_labellises_generation_2024 <- read_delim(here::here("data/Etablissements-labellises-generation-2024.csv"), ";", escape_double = FALSE, trim_ws = TRUE)
Hebergement_eleves_etablissements_2d <- read_delim(here::here("data/Hebergement-eleves-etablissements-2d.csv"), ";", escape_double = FALSE, trim_ws = TRUE)
Covid_sp_dep_7j_cage_scol_2022_10_10_19h02 <- read_delim(here::here("data/Covid-sp-dep-7j-cage-scol-2022-10-10-19h02.csv"), ";", escape_double = FALSE, trim_ws = TRUE)

3 Data Wrangling

We found that each data set needs some wrangling. We have established a checklist that we go through for each data set. We have to:

  1. Translate the column names. As all data sets have to be renamed in the same way, we created a function that takes a data frame and a vector of new names as inputs. It checks that the length of the vector matches the number of columns; if so, it returns a tibble with the renamed columns. Otherwise, it stops with an error message stating that the vector is not the right length.

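# Rename all columns of df with the names in x and return the result as a tibble.
rename_df <- function(df, x){
  if (ncol(df) != length(x)){
    stop("Vector is not the right length")
  }
  names(df) <- x
  as_tibble(df)  # last expression, so the renamed tibble is returned visibly
}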
  2. Make sure that all data are of the right type.
  3. Make sure that the time references (years) are all aligned with the exam session.
  4. Add a column department_fr, harmonized across all data sets so that they can be joined easily. We have also decided to keep only establishments in mainland France and Corsica, in order to get more comparable results. Including the results from the overseas regions would be interesting for further research.
  5. Carry out any further wrangling specific to the data set.
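As a minimal sketch of how the renaming helper from step 1 behaves, the snippet below applies a version of rename_df to a toy data frame and shows the error raised when the vector has the wrong length. The toy column names (annee, taux) are made up for illustration, and the tibble package is assumed to be installed.

```r
library(tibble)

# Same logic as the helper described in step 1, with an explicit return value.
rename_df <- function(df, x) {
  if (ncol(df) != length(x)) {
    stop("Vector is not the right length")
  }
  names(df) <- x
  as_tibble(df)
}

# Toy data frame with hypothetical French column names.
toy <- data.frame(annee = c(2020, 2021), taux = c("90,5%", "88,0%"))

renamed <- rename_df(toy, c("session", "success_rate_pct"))
names(renamed)

# A vector of the wrong length stops with an error instead of renaming silently.
msg <- tryCatch(rename_df(toy, c("session")),
                error = function(e) conditionMessage(e))
msg
```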

3.1 dnb_results

  1. Translate the column names.

dnb_colnames <- c("session", "school_id", "school_type", "establishment_name", "education_sector", "municipality_code", "municipality", "department_code", "department", "academy_code", "academy_name", "region_code", "region", "registered", "present", "admitted", "admitted_without", "admitted_AB", "admitted_B", "admitted_TB", "success_rate_pct"
)
dnb_results <- rename_df(DNB_par_etablissement, dnb_colnames)
  2. success_rate_pct is of the form xx,xx%; we want it as a double of the form xx.xx.

dnb_results[["success_rate_pct"]] <- as.double(gsub("%","",
                                               gsub(",",".", dnb_results[["success_rate_pct"]])))
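The conversion can also be wrapped in a small helper so that it is easy to check on a few values. This is only a sketch in base R; the name pct_to_double is hypothetical and not part of the project code.

```r
# Hypothetical helper: convert strings such as "95,50%" to the double 95.5.
pct_to_double <- function(x) {
  x <- gsub("%", "", x, fixed = TRUE)   # drop the percent sign
  x <- gsub(",", ".", x, fixed = TRUE)  # decimal comma -> decimal point
  as.double(trimws(x))
}

res <- pct_to_double(c("95,50%", "100,00%", NA))
res
```

Missing values stay missing, so the helper can be applied to a whole column at once.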
  3. We need to harmonize the year variables of all the other data sets to match the logic of this one. The year is the year of the exam session (e.g., the academic period “2020-2021” is represented as 2021).
  4. We need to add the column department_fr and drop the overseas departments and collectivities.

dnb_results$department_fr <- stri_trans_general(dnb_results$department, "Latin-ASCII") %>%
  str_to_title(.) %>% 
  gsub("Du", "du", .) %>% 
  gsub("De", "de", .) %>% 
  gsub("D'", "D", .) %>%
  gsub("Et", "et", .) %>%
  gsub(" ", "-", .) %>%
  str_replace_all("Corse-du-Sud", "Corse du Sud") %>% 
  str_replace_all("deux-Sevres", "Deux-Sevres") %>% 
  str_replace_all("Alpes-de-Hte-Provence", "Alpes-de-Haute-Provence") %>%       
  str_replace_all("Territoire-de-Belfort", "Territoire de Belfort") %>% 
  str_replace_all("Seine-Saint-denis", "Seine-Saint-Denis")

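dnb_results <- dnb_results %>% 
  # department_fr is ASCII-only at this point, hence "Francaise" without a cedilla;
  # rows with a missing department are dropped as well
  dplyr::filter(!is.na(department_fr),
                !department_fr %in% c("Polynesie-Francaise", "Guyane", "Martinique", "Guadeloupe", "La-Reunion", "Mayotte", "-"))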
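Because the same harmonization is applied to several data sets, the steps could be gathered in one function. The sketch below assumes the stringi and stringr packages; the name harmonize_department is hypothetical, and the replacement rules are simply those listed above.

```r
library(stringi)
library(stringr)

# Hypothetical helper wrapping the harmonization steps used above.
harmonize_department <- function(x) {
  x <- stri_trans_general(x, "Latin-ASCII")  # strip accents
  x <- str_to_title(x)                       # "AIN" -> "Ain"
  x <- gsub("Du", "du", x, fixed = TRUE)
  x <- gsub("De", "de", x, fixed = TRUE)
  x <- gsub("D'", "D", x, fixed = TRUE)
  x <- gsub("Et", "et", x, fixed = TRUE)
  x <- gsub(" ", "-", x, fixed = TRUE)
  # Manual corrections for names broken by the generic rules.
  fixes <- c("Corse-du-Sud"          = "Corse du Sud",
             "deux-Sevres"           = "Deux-Sevres",
             "Alpes-de-Hte-Provence" = "Alpes-de-Haute-Provence",
             "Territoire-de-Belfort" = "Territoire de Belfort",
             "Seine-Saint-denis"     = "Seine-Saint-Denis")
  str_replace_all(x, fixes)
}

harmonized <- harmonize_department(c("AIN", "DEUX-SEVRES", "SEINE-SAINT-DENIS"))
harmonized
```

Keeping the rules in one place means a new correction only has to be added once.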
  5. We want to know the attribution rate of each mention. It is needed in order to make fair comparisons between regions that do not have the same number of students.
dnb_results <- dnb_results %>% 
  mutate(without_pct = admitted_without/admitted*100,
         AB_pct = admitted_AB/admitted*100,
         B_pct = admitted_B/admitted*100,
         TB_pct = admitted_TB/admitted*100
         )

We can see the final table dnb_results below.

3.2 establishment_24

  1. Translate the column names.
est_24_names <- c("region", "academy", "department", "municipality", "establishment", "school_id", "school_type", "education_sector", "postcode", "adress", "adress_2", "mail", "students", "priority_education", "city_school", "QPV", "ULIS", "SEGPA", "sport_section", "agricultural_high_school", "military_high_school", "vocational_high_school", "establishment_web", "SIREN_SIRET", "district", "ministry", "label_start_date", "label_end_date", "y_coordinate", "x_coordinate", "epsg", "precision_on_localisation", "latitude", "longitude", "position", "engaging_30_sport")
establishment_24 <- rename_df(Etablissements_labellises_generation_2024, est_24_names)
  2. All variables are already of the right type.
  3. We need to add two variables, session_started and session_ended. As the label has a start date and an end date, we have to identify the first and the last session in which the establishment holds the Generation 2024 label. Most labels start and end in January, but a few start and end in the middle of the year. Exams take place at the end of June or the beginning of July. Therefore, we treat a label starting or ending in August or later as belonging to the next academic year.

establishment_24 <- establishment_24 %>% 
  mutate(session_started = case_when(month(label_start_date) <= 7 ~ year(label_start_date),
                                     month(label_start_date) >  7 ~ year(label_start_date)+1),
         session_ended = case_when(month(label_end_date) <= 7 ~ year(label_end_date),
                                   month(label_end_date) >  7 ~ year(label_end_date)+1)
         )
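The month rule can be checked on a few boundary dates. The sketch below is a base R equivalent of the case_when() logic; the function name date_to_session is hypothetical.

```r
# Hypothetical helper: map a date to the exam session it counts towards.
# Dates up to July belong to the current year's session; dates from
# August onwards count towards the following year's session.
date_to_session <- function(date) {
  date <- as.Date(date)
  yr <- as.integer(format(date, "%Y"))
  mo <- as.integer(format(date, "%m"))
  ifelse(mo <= 7, yr, yr + 1L)
}

sessions <- date_to_session(c("2019-01-01", "2019-07-31", "2019-08-01"))
sessions
```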
  4. We need to add the column department_fr.

establishment_24$department_fr <- stri_trans_general(establishment_24$department, "Latin-ASCII") %>%
  str_to_title(.) %>% 
  gsub("Du", "du", .) %>% 
  gsub("De", "de", .) %>% 
  gsub("D'", "D", .) %>%
  gsub("Et", "et", .) %>%
  gsub(" ", "-", .) %>%
  str_replace_all("Corse-du-Sud", "Corse du Sud") %>% 
  str_replace_all("deux-Sevres", "Deux-Sevres") %>% 
  str_replace_all("Territoire-de-Belfort", "Territoire de Belfort") %>% 
  str_replace_all("Seine-Saint-denis", "Seine-Saint-Denis")
The map below shows that the data set contains not only establishments from the overseas collectivities (COM) but also French international schools.

As previously discussed, we have decided to keep only data from mainland France and Corsica, so we also have to remove the French international schools. We take this opportunity to remove unused variables as well.

establishment_24 <- establishment_24 %>% 
  dplyr::filter(!department_fr %in% c("Polynesie-Francaise","Guyane", "Martinique", "Guadeloupe", "La-Reunion", "Mayotte", "Saint-Martin", "-")) %>% 
  # Rows with a "-" or missing department are the international schools.
  dplyr::filter(!is.na(department_fr))

establishment_24 <- establishment_24 %>% 
  select(-c(postcode:mail,city_school,QPV:SEGPA,establishment_web:ministry, precision_on_localisation))
  5. We create a high_school_type variable which combines the information from agricultural_high_school, military_high_school and vocational_high_school.

establishment_24 <- establishment_24 %>% 
  mutate(high_school_type = case_when(agricultural_high_school == 1 ~ "agricultural high school",
                                      military_high_school == 1 ~ "military high school",
                                      vocational_high_school == 1 ~ "vocational high school")) %>% 
    select(-c(agricultural_high_school, military_high_school, vocational_high_school))

We can see the final table establishment_24 below.

3.3 student_housing

  1. Translate the column names.

housing_names <- c("year_back_to_school", "Academic_region", "academy", "department", "municipality", "school_id", "establishment_main_name", "establishment_name", "school_type", "education_sector", "students_secondary_education", "students_higher_education", "external_students_secondary_education", "half_boarders_students_secondary_education", "boarding_students_secondary_education", "external_students_higher_education", "half_board_students_higher_education", "boarding_students_higher_education")
student_housing <- rename_df(Hebergement_eleves_etablissements_2d, housing_names)
  2. All variables are already of the right type.
  3. We need to create a session variable, as year_back_to_school refers to the beginning of the school year and not to the exam session.

student_housing <- student_housing %>% 
  mutate(session = year_back_to_school + 1) %>% 
    select(year_back_to_school,session, everything()) #here just to order variables
  4. We need to add the column department_fr.
student_housing$department_fr <- stri_trans_general(student_housing$department, "Latin-ASCII") %>%
  str_to_title(.) %>% 
  gsub("Du", "du", .) %>% 
  gsub("De", "de", .) %>% 
  gsub("D'", "D", .) %>%
  gsub("Et", "et", .) %>%
  gsub(" ", "-", .) %>%
  str_replace_all("Corse-du-Sud", "Corse du Sud") %>% 
  str_replace_all("deux-Sevres", "Deux-Sevres") %>% 
  str_replace_all("Alpes-de-Hte-Provence", "Alpes-de-Haute-Provence") %>%       
  str_replace_all("Territoire-de-Belfort", "Territoire de Belfort") %>% 
  str_replace_all("Seine-Saint-denis", "Seine-Saint-Denis")
  5. No further data wrangling is needed for this data set.

We can see the final table student_housing below.

3.4 single_parent

  1. Translate the column names.
sg_parent_names <- c("geocode", "municipality", "year","sing_par")

3.5 covid_in_schools

  1. Translate the column names.
covid_names <- c("department_code", "test_week", "educational_level", "age_group", "pop", "positive", "tested", "incidence_rate", "positivity_rate", "screening_rate")
covid_in_schools <- rename_df(Covid_sp_dep_7j_cage_scol_2022_10_10_19h02, covid_names)
  2. test_week will be treated in step 3. positive, incidence_rate and positivity_rate need to be converted to doubles.
covid_in_schools[["positive"]] <- as.double(gsub(",",".", covid_in_schools[["positive"]]))
covid_in_schools[["incidence_rate"]] <- as.double(gsub(",",".", covid_in_schools[["incidence_rate"]]))
covid_in_schools[["positivity_rate"]] <- as.double(gsub(",",".", covid_in_schools[["positivity_rate"]]))
  3. We need to create two new variables. The first is a date identifying each week; we chose the first day of the week (Monday). The second is the session: the tests counted for a session are those from August to July of the following year. As the assignment is based on the month of that Monday, some tests carried out in the first days of August may count towards the “wrong” session. The number of COVID cases in August is relatively low compared to the rest of the year, and this concerns at most 6 days of tests. We therefore consider this margin of error acceptable.
covid_in_schools <- covid_in_schools %>% 
  mutate(test_date = ymd(substr(test_week,1,10)),
         session = case_when(month(test_date) <= 7 ~ year(test_date),
                             month(test_date) >  7 ~ year(test_date)+1))
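To illustrate the margin of error mentioned in this step, the sketch below parses two toy test_week values in base R and compares the session obtained from the first and from the last day of each week. The values and the helper name session_of are made up for illustration.

```r
# Toy test_week values in the "yyyy-mm-dd-yyyy-mm-dd" format.
test_week <- c("2021-07-26-2021-08-01", "2021-08-02-2021-08-08")

week_start <- as.Date(substr(test_week, 1, 10))   # first day of the week
week_end   <- as.Date(substr(test_week, 12, 21))  # last day of the week

# Session rule: January to July -> same year, August to December -> next year.
session_of <- function(d) {
  as.integer(format(d, "%Y")) + (as.integer(format(d, "%m")) > 7)
}

weeks <- data.frame(test_week,
                    session_from_start = session_of(week_start),
                    session_from_end   = session_of(week_end))
weeks
```

The first week straddles July and August: its Monday assigns it to one session while its last day would fall in the next, which is the small misclassification accepted above.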
  4. The data set contains only department codes, so we need to add the names of the departments.

  5. To simplify the data set, we need to drop all educational_level values except 11-15.
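One possible way to carry out these last two steps is sketched below with dplyr on toy tables. It assumes that department codes are stored as text in the same format in both tables and that the 11-15 group is labelled "[11-15]"; both are assumptions, and all values are made up.

```r
library(dplyr)

# Toy stand-ins for covid_in_schools and dnb_results (values are made up).
covid_toy <- tibble(
  department_code   = c("01", "01", "2A"),
  educational_level = c("[11-15]", "[6-11]", "[11-15]"),
  positive          = c(12, 30, 5)
)
dnb_toy <- tibble(
  department_code = c("01", "01", "2A"),
  department_fr   = c("Ain", "Ain", "Corse du Sud")
)

# One row per department code, to be used as a lookup table.
dep_lookup <- distinct(dnb_toy, department_code, department_fr)

covid_named <- covid_toy %>%
  left_join(dep_lookup, by = "department_code") %>%  # attach department names
  filter(educational_level == "[11-15]")             # keep the 11-15 group only
covid_named
```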

We can see the final table covid_in_schools below.

3.6 Auxiliary data sets

We will use the map of France available through ggplot2 for our visualizations.

map <- map_data("france")

The region variable in fact contains the departments. We rename it “department_fr” to match the other data sets.

colnames(map)[5]<- "department_fr"

4 Exploratory data analysis

4.1 dnb_results

To explore this data set, we start at the national level to analyse the overall trend. We then go down a level to a regional analysis, to compare the number of students and see which regions perform better. An analysis at the departmental level is then performed to dig deeper into the success rate and the graduation rate for each mention. To complete the analysis, we look at the results by establishment in the best- and worst-performing departments in 2020, using their 2006 results for comparison.

4.1.1 National analysis

France_results <- dnb_results %>% #select(session,registered, present, contains("admitted")) %>% 
  group_by(session) %>% 
  summarise(registered = sum(registered),
            present = sum(present),
            admitted = sum(admitted),
            admitted_without = sum(admitted_without),
            admitted_AB = sum(admitted_AB),
            admitted_B = sum(admitted_B),
            admitted_TB = sum(admitted_TB),
            without_pct = mean(without_pct, na.rm = TRUE),
            AB_pct = mean(AB_pct, na.rm = TRUE),
            B_pct = mean(B_pct, na.rm = TRUE),
            TB_pct = mean(TB_pct, na.rm = TRUE),
            success_rate_pct = mean(success_rate_pct, na.rm = TRUE)) %>% 
  pivot_longer(c(registered, present,contains("admitted")),
               names_to = "Candidates",
               values_to = "Number_of_students") %>% 
   pivot_longer(c(contains("pct")), 
               names_to = "Mention_type",
               values_to = "Rate")

The graph below was made by first grouping data by session. A sum was then applied to summarize the variables.


p <- France_results %>% 
  ggplot(aes(x = session, y = Number_of_students, group = Candidates, color = Candidates))+
  geom_line()+
  scale_color_viridis(discrete = TRUE) +
    ggtitle("National DNB statistics") +
    theme_ipsum() +
    ylab("Number of students")

ggplotly(p, tooltip = c("x" ,"y"))
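Note that the counts are summed per session while the rates are averaged over schools, so every school weighs the same in the national rate regardless of its size. The toy example below (made-up numbers, base R) shows how this can differ from a rate computed on national totals; which of the two is preferable depends on the question asked.

```r
# Two toy schools in the same session: one large, one small.
toy <- data.frame(
  session  = c(2020, 2020),
  present  = c(200, 20),
  admitted = c(180, 10)
)
toy$success_rate_pct <- toy$admitted / toy$present * 100

# Unweighted mean of school-level rates: every school counts equally.
mean_of_rates <- mean(toy$success_rate_pct)

# Rate computed from totals: every candidate counts equally.
rate_of_sums <- sum(toy$admitted) / sum(toy$present) * 100

c(mean_of_rates = mean_of_rates, rate_of_sums = rate_of_sums)
```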

We can notice that the

National DNB statistics (rate)

4.1.2 Regional Analysis

Success rate

4.1.2.1 Number of students per result and region during the period 2006-2021

4.1.2.1.1 Number of admitted by region
4.1.2.1.2 Admitted with zero mention
4.1.2.1.3 Admitted with mention AB
4.1.2.1.4 Admitted with mention B
4.1.2.1.5 Admitted with mention TB

4.1.2.2 Period 2006-2021 by region

4.1.2.2.1 Success rate
4.1.2.2.2 Admitted with zero mention
4.1.2.2.3 Admitted with mention AB
4.1.2.2.4 Admitted with mention B
4.1.2.2.5 Admitted with mention TB

4.1.3 Departmental Analysis

4.1.3.1 Box plot Analysis per department

4.1.3.1.1 success_rate_pct
4.1.3.1.2 without_pct
4.1.3.1.3 B_pct
4.1.3.1.4 AB_pct
4.1.3.1.5 TB_pct

4.1.3.2 Box plot analysis at the department level

4.1.3.2.1 Paris 2020

First analysis: the best-performing department, with the highest rate of TB (Paris, 2020).

#> Warning: The following aesthetics were dropped during statistical
#> transformation: text2
#> i This can happen when ggplot fails to infer the correct grouping
#>   structure in the data.
#> i Did you forget to specify a `group` aesthetic or to convert a
#>   numerical variable into a factor?
4.1.3.2.2 Paris 2006

The same department in 2006, for comparison (Paris, 2006).

4.1.3.2.3 Eure et Loir 2020

Lowest-performing department in TB rate and “best” in zero-mention rate (Eure-et-Loir, 2020).

4.1.3.2.4 Eure et Loir 2006

The same department in 2006, for comparison (Eure-et-Loir, 2006).


4.2 Establishment_24

The

4.2.1 Map trial

Creation of the map theme


map_theme <- theme(title=element_text(),
                   plot.title=element_text(margin=margin(20,20,20,20), size=18, hjust = 0.5),
                   axis.text.x=element_blank(),
                   axis.text.y=element_blank(),
                   axis.ticks=element_blank(),
                   axis.title.x=element_blank(),
                   axis.title.y=element_blank(),
                   panel.grid.major= element_blank(), 
                   panel.background= element_blank()) 

Creation of the dataset used for the map


result <- dnb_results %>% 
  group_by(department_fr) %>% 
  # na.rm = TRUE so that one missing rate does not turn a department's mean into NA
  summarise(success_rate = mean(success_rate_pct, na.rm = TRUE))

Join the map from ggplot and our new dataset


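# Drop the unused 6th column (subregion) and join on the harmonized department name.
result_map <- left_join(x = map[,-6], y = result, by = "department_fr")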
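Before plotting, it can help to check that the department names match on both sides of the join, since a mismatch shows up as an empty area on the map. A minimal base R sketch on toy vectors (the names below merely stand in for unique(map$department_fr) and result$department_fr):

```r
# Toy vectors standing in for the department names of the map and of our data.
map_departments  <- c("Ain", "Cote-Dor", "Corse du Sud")
data_departments <- c("Ain", "Cote-D'or", "Corse du Sud")

# Map names with no match in the data get NA values after the left join.
unmatched_on_map <- setdiff(map_departments, data_departments)

# Data names unknown to the map are dropped by the left join.
unmatched_in_data <- setdiff(data_departments, map_departments)

unmatched_on_map
unmatched_in_data
```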

plot

  • Mapping out the underlying structure
  • Identifying the most important variables
  • Univariate visualizations
  • Multivariate visualizations
  • Summary tables

5 Analysis

Regression trial

lm1 <- lm(dnb_results$TB_pct ~ dnb_results$without_pct + dnb_results$B_pct + dnb_results$AB_pct + dnb_results$without_pct)
summary(lm1)
#> 
#> Call:
#> lm(formula = dnb_results$TB_pct ~ dnb_results$without_pct + dnb_results$B_pct + 
#>     dnb_results$AB_pct + dnb_results$without_pct)
#> 
#> Residuals:
#>       Min        1Q    Median        3Q       Max 
#> -1.71e-09  0.00e+00  0.00e+00  0.00e+00  2.55e-11 
#> 
#> Coefficients:
#>                          Estimate Std. Error   t value Pr(>|t|)    
#> (Intercept)              1.00e+02   1.17e-13  8.53e+14   <2e-16 ***
#> dnb_results$without_pct -1.00e+00   1.26e-15 -7.95e+14   <2e-16 ***
#> dnb_results$B_pct       -1.00e+00   2.25e-15 -4.44e+14   <2e-16 ***
#> dnb_results$AB_pct      -1.00e+00   1.50e-15 -6.66e+14   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 4.74e-12 on 135154 degrees of freedom
#>   (29 observations deleted due to missingness)
#> Multiple R-squared:     1,   Adjusted R-squared:     1 
#> F-statistic: 3.04e+29 on 3 and 135154 DF,  p-value: <2e-16
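An intercept of 100, slopes of -1 and an R-squared of 1 are what one would expect if the four mention rates add up to 100 for every school, in which case the regression only recovers an accounting identity rather than a relationship between variables. The sketch below reproduces this behaviour on simulated counts; the data are random and only serve as an illustration.

```r
set.seed(1)
n <- 50

# Toy counts per school for the four mutually exclusive outcomes.
counts <- matrix(rpois(4 * n, lambda = 20) + 1, ncol = 4,
                 dimnames = list(NULL, c("without", "AB", "B", "TB")))
admitted <- rowSums(counts)

# Shares of admitted candidates, in percent (each row is divided by its total).
pct <- as.data.frame(counts / admitted * 100)

# The four shares add up to 100 for every school ...
range(rowSums(pct))

# ... so TB can be recovered exactly from the other three.
fit <- lm(TB ~ without + AB + B, data = pct)
round(coef(fit), 6)
```

A model meant to explain TB_pct would therefore need predictors that are not themselves components of the same total.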

Clustering trial

dnb_pct_dep <- dnb_results %>%
  group_by(department, session) %>% 
  summarise(AB_pct_dep = mean(AB_pct, na.rm = TRUE),
            B_pct_dep = mean(B_pct, na.rm = TRUE),
            TB_pct_dep = mean(TB_pct, na.rm = TRUE),
            without_pct_dep = mean(without_pct, na.rm = TRUE),
            success_rate_pct_dep = mean(success_rate_pct, na.rm = TRUE))

pairs(dnb_pct_dep[2:6])

# Compute distances on the numeric rate columns only; the department names cannot enter a distance.
distance <- dnb_pct_dep %>%
  ungroup() %>%
  select(ends_with("_pct_dep")) %>%
  dist()

mydata.hclust <- hclust(distance)
plot(mydata.hclust)
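A common variant of this procedure standardizes the variables before computing distances and then cuts the dendrogram into groups. The sketch below uses base R on made-up department-level rates; the number of groups (3) and the linkage method are arbitrary choices for illustration.

```r
set.seed(42)

# Toy department-level rates (made-up values), one row per department.
rates <- data.frame(
  AB_pct_dep = runif(6, 20, 30),
  B_pct_dep  = runif(6, 15, 25),
  TB_pct_dep = runif(6, 5, 20),
  row.names  = paste0("dep_", 1:6)
)

# Standardize so that no single rate dominates the Euclidean distance.
distance <- dist(scale(rates))
tree     <- hclust(distance, method = "complete")

# Cut the dendrogram into a chosen number of groups.
groups <- cutree(tree, k = 3)
groups
```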

  • Answers to the research questions
  • Different methods considered
  • Competing approaches
  • Justifications

6 Conclusion

  • Take home message
  • Limitations
  • Future work?